LOCUS       HIVHXB2CG    9719 bp ss-RNA             VRL       19-AUG-1999
DEFINITION  Human immunodeficiency virus type 1 (HXB2), complete genome;
            HIV1/HTLV-III/LAV reference genome.
ACCESSION   K03455 M38432
VERSION     K03455.1  GI:1906382
KEYWORDS    TAR protein; acquired immune deficiency syndrome; complete genome;
            env protein; gag protein; long terminal repeat (LTR); pol protein;
            polyprotein; proviral gene; reverse transcriptase; transactivator.
SOURCE      Human immunodeficiency virus type 1.
  ORGANISM  Human immunodeficiency virus type 1
            Viruses; Retroid viruses; Retroviridae; Lentivirus; Primate
            lentivirus group.

REFERENCE   1  (bases 493 to 674; 9577 to 9718)
  AUTHORS   Ratner,L., Haseltine,W., Patarca,R., Livak,K.J., Starcich,B.,
            Josephs,S.F., Doran,E.R., Rafalski,J.A., Whitehorn,E.A.,
            Baumeister,K., Ivanoff,L., Petteway,S.R. Jr., Pearson,M.L.,
            Lautenberger,J.A., Papas,T.S., Ghrayeb,J., Chang,N.T., Gallo,R.C.
            and Wong-Staal,F.
  TITLE     Complete nucleotide sequence of the AIDS virus, HTLV-III
  JOURNAL   Nature 313 (6000), 277-284 (1985)
  MEDLINE   85111123
   PUBMED   2578615
REFERENCE   2  (bases 1 to 653)
  AUTHORS   Starcich,B., Ratner,L., Josephs,S.F., Okamoto,T., Gallo,R.C. and
            Wong-Staal,F.
  TITLE     Characterization of long terminal repeat sequences of HTLV-III
  JOURNAL   Science 227 (4686), 538-540 (1985)
  MEDLINE   85090465

REFERENCE   3  (sites)
  AUTHORS   Allan,J.S., Coligan,J.E., Barin,F., McLane,M.F., Sodroski,J.G.,
            Rosen,C.A., Haseltine,W.A., Lee,T.H. and Essex,M.
  TITLE     Major glycoprotein antigens that induce antibodies in AIDS patients
            are encoded by HTLV-III
  JOURNAL   Science 228 (4703), 1091-1094 (1985)
  MEDLINE   85192537

REFERENCE   4  (sites)
  AUTHORS   Sodroski,J., Patarca,R., Rosen,C., Wong-Staal,F. and Haseltine,W.
  TITLE     Location of the trans-activating region on the genome of human
            T-cell lymphotropic virus type III
  JOURNAL   Science 229 (4708), 74-77 (1985)
  MEDLINE   85244627
REFERENCE   5  (sites)
  AUTHORS   Arya,S.K., Guo,C., Josephs,S.F. and Wong-Staal,F.
  TITLE     Trans-activator gene of human T-lymphotropic virus type III
            (HTLV-III)
  JOURNAL   Science 229 (4708), 69-73 (1985)
  MEDLINE   85244626
REFERENCE   6  (sites)
  AUTHORS   van Beveren,C.P., Coffin,J. and Hughes,S.
  TITLE     Appendix B: HTLV-3/LAV genome
  JOURNAL   (in) Weiss,R.L., Teich,N., Varmus,H. and Coffin,J. (Eds.);
            RNA TUMOR VIRUSES, SECOND EDITION, 2, Vol. 2: 1102-1123;
            Cold Spring Harbor Laboratory, Cold Spring Harbor (1985)

REFERENCE   7  (sites)
  AUTHORS   Rosen,C.A., Sodroski,J.G. and Haseltine,W.A.
  TITLE     The location of cis-acting regulatory sequences in the human T cell
            lymphotropic virus type III (HTLV-III/LAV) long terminal repeat
  JOURNAL   Cell 41 (3), 813-823 (1985)
  MEDLINE   85228232
REFERENCE   8  (sites)
  AUTHORS   Rabson,A.B., Daugherty,D.F., Venkatesan,S., Boulukos,K.E.,
            Benn,S.I., Folks,T.M., Feorino,P. and Martin,M.A.
  TITLE     Transcription of novel open reading frames of AIDS retrovirus
            during infection of lymphocytes
  JOURNAL   Science 229 (4720), 1388-1390 (1985)
  MEDLINE   85300515
REFERENCE   9  (sites)
  AUTHORS   Allan,J.S., Coligan,J.E., Lee,T.H., McLane,M.F., Kanki,P.J.,
            Groopman,J.E. and Essex,M.
  TITLE     A new HTLV-III/LAV encoded antigen detected by antibodies from AIDS
            patients
  JOURNAL   Science 230 (4727), 810-813 (1985)
  MEDLINE   86044509

REFERENCE   10 (sites)
  AUTHORS   Rosen,C.A., Sodroski,J.G., Goh,W.C., Dayton,A.I., Lippke,J. and
            Haseltine,W.A.
  TITLE     Post-transcriptional regulation accounts for the trans-activation
            of the human T-lymphotropic virus type III
  JOURNAL   Nature 319 (6054), 555-559 (1986)
  MEDLINE   86118720
REFERENCE   11 (sites)
  AUTHORS   di Marzo Veronese,F., Copeland,T.D., DeVico,A.L., Rahman,R.,
            Oroszlan,S., Gallo,R.C. and Sarngadharan,M.G.
  TITLE     Characterization of highly immunogenic p66/p51 as the reverse
            transcriptase of HTLV-III/LAV
  JOURNAL   Science 231 (4743), 1289-1291 (1986)
  MEDLINE   86122937
REFERENCE   12 (sites)
  AUTHORS   Kan,N.C., Franchini,G., Wong-Staal,F., DuBois,G.C., Robey,W.G.,
            Lautenberger,J.A. and Papas,T.S.
  TITLE     Identification of HTLV-III/LAV sor gene product and detection of
            antibodies in human sera
  JOURNAL   Science 231 (4745), 1553-1555 (1986)
  MEDLINE   86151663

REFERENCE   13 (sites)
  AUTHORS   Kramer,R.A., Schaber,M.D., Skalka,A.M., Ganguly,K., Wong-Staal,F.
            and Reddy,E.P.
  TITLE     HTLV-III gag protein is processed in yeast cells by the virus
            pol-protease
  JOURNAL   Science 231 (4745), 1580-1584 (1986)
  MEDLINE   86151671
REFERENCE   14 (sites)
  AUTHORS   Lee,T.H., Coligan,J.E., Allan,J.S., McLane,M.F., Groopman,J.E. and
            Essex,M.
  TITLE     A new HTLV-III/LAV protein encoded by a gene found in cytopathic
            retroviruses
  JOURNAL   Science 231 (4745), 1546-1549 (1986)
  MEDLINE   86151661
REFERENCE   15 (sites)
  AUTHORS   Dayton,A.I., Sodroski,J.G., Rosen,C.A., Goh,W.C. and Haseltine,W.A.
  TITLE     The trans-activator gene of the human T cell lymphotropic virus
            type III is required for replication
  JOURNAL   Cell 44 (6), 941-947 (1986)
  MEDLINE   86161683

REFERENCE   16 (sites)
  AUTHORS   Sodroski,J., Goh,W.C., Rosen,C., Tartar,A., Portetelle,D., Burny,A.
            and Haseltine,W.
  TITLE     Replicative and cytopathic potential of HTLV-III/LAV with sor gene
            deletions
  JOURNAL   Science 231 (4745), 1549-1553 (1986)
  MEDLINE   86151662
REFERENCE   17 (sites)
  AUTHORS   Arya,S.K. and Gallo,R.C.
  TITLE     Three novel genes of human T-lymphotropic virus type III: immune
            reactivity of their products with sera from acquired immune
            deficiency syndrome patients
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 83 (7), 2209-2213 (1986)
  MEDLINE   86177573
REFERENCE   18 (sites)
  AUTHORS   Jones,K.A., Kadonaga,J.T., Luciw,P.A. and Tjian,R.
  TITLE     Activation of the AIDS retrovirus promoter by the cellular
            transcription factor, Sp1
  JOURNAL   Science 232 (4751), 755-759 (1986)
  MEDLINE   86179897

REFERENCE   19 (sites)
  AUTHORS   Sodroski,J., Goh,W.C., Rosen,C., Dayton,A., Terwilliger,E. and
            Haseltine,W.
  TITLE     A second post-transcriptional trans-activator gene required for
            HTLV-III replication
  JOURNAL   Nature 321 (6068), 412-417 (1986)
  MEDLINE   86230863
REFERENCE   20 (sites)
  AUTHORS   Starcich,B.R., Hahn,B.H., Shaw,G.M., McNeely,P.D., Modrow,S.,
            Wolf,H., Parks,E.S., Parks,W.P., Josephs,S.F., Gallo,R.C. and
            Wong-Staal,F.
  TITLE     Identification and characterization of conserved and variable
            regions in the envelope gene of HTLV-III/LAV, the retrovirus of
            AIDS
  JOURNAL   Cell 45 (5), 637-648 (1986)
  MEDLINE   86218077
REFERENCE   21 (sites)
  AUTHORS   Willey,R.L., Rutledge,R.A., Dias,S., Folks,T., Theodore,T.,
            Buckler,C.E. and Martin,M.A.
  TITLE     Identification of conserved and divergent domains within the
            envelope gene of the acquired immunodeficiency syndrome retrovirus
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 83 (14), 5038-5042 (1986)
  MEDLINE   86259728

REFERENCE   22 (bases 8761 to 9060)
  AUTHORS   Fisher,A.G., Ratner,L., Mitsuya,H., Marselle,L.M., Harper,M.E.,
            Broder,S., Gallo,R.C. and Wong-Staal,F.
  TITLE     Infectious mutants of HTLV-III with changes in the 3' region and
            markedly reduced cytopathic effects
  JOURNAL   Science 233 (4764), 655-659 (1986)
  MEDLINE   86261824
REFERENCE   23 (sites)
  AUTHORS   Feinberg,M.B., Jarrett,R.F., Aldovini,A., Gallo,R.C. and
            Wong-Staal,F.
  TITLE     HTLV-III expression and production involve complex regulation at
            the levels of splicing and translation of viral RNA
  JOURNAL   Cell 46 (6), 807-817 (1986)
  MEDLINE   87002448
REFERENCE   24 (sites)
  AUTHORS   Lightfoote,M.M., Coligan,J.E., Folks,T.M., Fauci,A.S., Martin,M.A.
            and Venkatesan,S.
  TITLE     Structural characterization of reverse transcriptase and
            endonuclease polypeptides of the acquired immunodeficiency syndrome
            retrovirus
  JOURNAL   J. Virol. 60 (2), 771-775 (1986)
  MEDLINE   87036947
REFERENCE   25 (sites)
  AUTHORS   Wright,C.M., Felber,B.K., Paskalis,H. and Pavlakis,G.N.
  TITLE     Expression and characterization of the trans-activator of
            HTLV-III/LAV virus
  JOURNAL   Science 234 (4779), 988-992 (1986)
  MEDLINE   87042788
REFERENCE   26 (sites)
  AUTHORS   Terwilliger,E., Sodroski,J.G., Rosen,C.A. and Haseltine,W.A.
  TITLE     Effects of mutations within the 3' orf open reading frame region of
            human T-cell lymphotropic virus type III (HTLV-III/LAV) on
            replication and cytopathogenicity
  JOURNAL   J. Virol. 60 (2), 754-760 (1986)
  MEDLINE   87036943
REFERENCE   27 (sites)
  AUTHORS   Goh,W.C., Sodroski,J.G., Rosen,C.A. and Haseltine,W.A.
  TITLE     Expression of the art gene protein of human T-lymphotropic virus
            type III (HTLV-III/LAV) in bacteria
  JOURNAL   J. Virol. 61 (2), 633-637 (1987)
  MEDLINE   87112968
REFERENCE   28 (sites)
  AUTHORS   Modrow,S., Hahn,B.H., Shaw,G.M., Gallo,R.C., Wong-Staal,F. and
            Wolf,H.
  TITLE     Computer-assisted analysis of envelope protein sequences of seven
            human immunodeficiency virus isolates: prediction of antigenic
            epitopes in conserved and variable regions
  JOURNAL   J. Virol. 61 (2), 570-578 (1987)
  MEDLINE   87112954
REFERENCE   29 (sites)
  AUTHORS   Muesing,M.A., Smith,D.H. and Capon,D.J.
  TITLE     Regulation of mRNA accumulation by a human immunodeficiency virus
            trans-activator protein
  JOURNAL   Cell 48 (4), 691-701 (1987)
  MEDLINE   87131081
REFERENCE   30 (sites)
  AUTHORS   Nabel,G. and Baltimore,D.
  TITLE     An inducible transcription factor activates expression of human
            immunodeficiency virus in T cells
  JOURNAL   Nature 326 (6114), 711-713 (1987)
  MEDLINE   87173065
  REMARK    Erratum:[Nature 1990 Mar 8;344(6262):178]
REFERENCE   31 (sites)
  AUTHORS   Fisher,A.G., Ensoli,B., Ivanoff,L., Chamberlain,M., Petteway,S.,
            Ratner,L., Gallo,R.C. and Wong-Staal,F.
  TITLE     The sor gene of HIV-1 is required for efficient virus transmission
            in vitro
  JOURNAL   Science 237 (4817), 888-893 (1987)
  MEDLINE   87292118
REFERENCE   32 (sites)
  AUTHORS   Patarca,R., Heath,C., Goldenberg,G.J., Rosen,C.A., Sodroski,J.G.,
            Haseltine,W.A. and Hansen,U.M.
  TITLE     Transcription directed by the HIV long terminal repeat in vitro
  JOURNAL   AIDS Res. Hum. Retroviruses 3 (1), 41-55 (1987)
  MEDLINE   87299195
REFERENCE   33 (sites)
  AUTHORS   Wong-Staal,F., Chanda,P.K. and Ghrayeb,J.
  TITLE     Human immunodeficiency virus: the eighth gene
  JOURNAL   AIDS Res. Hum. Retroviruses 3 (1), 33-39 (1987)
  MEDLINE   87299194
REFERENCE   34 (bases 1 to 9635; 1 to 9635)
  AUTHORS   Ratner,L., Fisher,A., Jagodzinski,L.L., Mitsuya,H., Liou,R.S.,
            Gallo,R.C. and Wong-Staal,F.
  TITLE     Complete nucleotide sequences of functional clones of the AIDS
            virus
  JOURNAL   AIDS Res. Hum. Retroviruses 3 (1), 57-69 (1987)
  MEDLINE   87299196
REFERENCE   35 (bases 6225 to 8795)
  AUTHORS   Reitz,M.S. Jr., Wilson,C., Naugle,C., Gallo,R.C. and
            Robert-Guroff,M.
  TITLE     Generation of a neutralization-resistant variant of HIV-1 is due to
            selection for a point mutation in the envelope gene
  JOURNAL   Cell 54 (1), 57-63 (1988)
  MEDLINE   88253426
REFERENCE   36 (bases 790 to 2292)
  AUTHORS   Pal,R., Reitz,M.S. Jr., Tschachler,E., Gallo,R.C.,
            Sarngadharan,M.G. and Veronese,F.D.
  TITLE     Myristoylation of gag proteins of HIV-1 plays an important role in
            virus assembly
  JOURNAL   AIDS Res. Hum. Retroviruses 6 (6), 721-730 (1990)
  MEDLINE   90303964
REFERENCE   37 (sites)
  AUTHORS   Ido,E., Han,H.P., Kezdy,F.J. and Tang,J.
  TITLE     Kinetic studies of human immunodeficiency virus type 1 protease and
            its active-site hydrogen bond mutant A28S
  JOURNAL   J. Biol. Chem. 266 (36), 24359-24366 (1991)
  MEDLINE   92105089




COMMENT     On Mar 25, 1997 this sequence version replaced gi:327742.
            [6] sites; tat mRNA and other transcript boundaries. [7] sites;
            tat mRNA.
            [8] sites; mRNA splice sites.
            [9] sites; 27K antigen cds.
            [5] sites; gp160 and gp120 coding sequences.
            [1] sites; regulatory sequences in the LTR.
            [(in) Weiss,R., Teich,N., Varmus,H. and Coffin,J. (Eds.); RNA Tumor
            Viruses, Secon] review; bases 1 to 9718.
            [15] sites; trans-activator function and TAR sequence. [19]
            sites; pol coding sequence.
            [22] sites; 23K sor gene product.
            [23] sites; pol NH2-terminal region.
            [20] sites; sor 23K protein.
            [21] sites; sor 23K protein.
            [24] sites; Sp1 binding sites in the promoter region. [17] sites;
            acceptor and donor splice sites for tat and 27K. [10] sites;
            deletion mutants in the tat gene.
            [18] sites; env gene conserved/variable regions; separate entries.
            [16] sites; trs cds boundaries.
            [12] sites; trs cds boundaries.
            [11] sites; env gene conserved/variable regions; separate entries.
            [26] sites; tar or transactivator target.
            [13] sites; 3' orf mutations.
            [14] sites; pol p34 terminus.
            [31] sites; promoter, TAR, tat-III mutants.
            [32] sites; envelope protein epitopes.
            [33] sites; trs/art protein.
            [34] sites; inducible enhancer element.
            [27] revises [30].
            [29] sites; long terminal repeat.
            [28] sites; R orf.
            [35] sites; sor.

            Sequence for [25] kindly provided in computer-readable form by
            L.Ratner, 19-AUG-1986.

            The HXB2 sequence is being used as a reference genome for all the
            HIV entries because it has been derived from a demonstrably
            infectious clone. Hence not all of the 'sites' references above
            were concerned with this isolate.
FEATURES             Location/Qualifiers
     source          1..9719
                     /organism="Human immunodeficiency virus type 1"
                     /proviral
                     /isolate="HXB2"
                     /db_xref="taxon:11676"
                     /note="HTLV-III/LAV"
     LTR             1..634
                     /note="5' LTR"
     repeat_region   454..551
                     /note="R repeat 5' copy"
     mRNA            455..9635
                     /product="HXB2 genomic mRNA"
     prim_transcript 455..9635
                     /note="tat, trs, 27K subgenomic mRNA"
     intron          744..5777
                     /note="tat, trs, 27K mRNA intron 1"
     CDS             790..2292
                     /note="gag polyprotein"
                     /codon_start=1
                     /protein_id="AAB50258.1"
                     /db_xref="GI:327745"
                     /translation="MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERF
                     AVNPGLLETSEGCRQILGQLQPSLQTGSEELRSLYNTVATLYCVHQRIEIKDTKEALD
                     KIEEEQNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQGQMVHQAISPRTLNAWVKVVE
                     EKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAAEWDRVHPV
                     HAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTNNPPIPVGEIYKRWIILGLNKIVRM
                     YSPTSILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTIL
                     KALGPAATLEEMMTACQGVGGPGHKARVLAEAMSQVTNSATIMMQRGNFRNQRKIVKC
                     FNCGKEGHTARNCRAPRKKGCWKCGKEGHQMKDCTERQANFLGKIWPSYKGRPGNFLQ
                     SRPEPTAPPEESFRSGVETTTPPQKQEPIDKELYPLTSLRSLFGNDPSSQ"
     CDS             2358..5096
                     /note="pol polyprotein (NH2-terminus uncertain)"
                     /codon_start=1
                     /protein_id="AAB50259.1"
                     /db_xref="GI:1906384"
                     /translation="MSLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGP
                     TPVNIIGRNLLTQIGCTLNFPISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEIC
                     TEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPH
                     PAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYTAFTIPSINNETPGIRYQYNVLPQGWK
                     GSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRW
                     GLTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQI
                     YPGIKVRQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAE
                     IQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKITTESIVIWGKT
                     PKFKLPIQKETWETWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDG
                     AANRETKLGKAGYVTNRGRQKVVTLTDTTNQKTELQAIYLALQDSGLEVNIVTDSQYA
                     LGIIQAQPDQSESELVNQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVL
                     FLDGIDKAQDEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSP
                     GIWQLDCTHLEGKVILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTD
                     NGSNFTGATVRAACWWAGIKQEFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKT
                     AVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQNFRVYYRDSRN
                     PLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED"
     CDS             5041..5619
                     /note="sor 23K protein"
                     /codon_start=1
                     /protein_id="AAB50260.1"
                     /db_xref="GI:327747"
                     /translation="MENRWQVMIVWQVDRMRIRTWKSLVKHHMYVSGKARGWFYRHHY
                     ESPHPRISSEVHIPLGDARLVITTYWGLHTGERDWHLGQGVSIEWRKKRYSTQVDPEL
                     ADQLIHLYYFDCFSDSAIRKALLGHIVSPRCEYQAGHNKVGSLQYLALAALITPKKIK
                     PPLPSVTKLTEDRWNKPQKTKGHRGSHTMNGH"
     CDS             5559..5795
                     /note="R (ORF) protein"
                     /codon_start=1
                     /protein_id="AAB50261.1"
                     /db_xref="GI:327748"
                     /translation="MEQAPEDQGPQREPHNEWTLELLEELKNEAVRHFPRIWLHGLGQ
                     HIYETYGDTWAGVEAIIRILQQLLFIHFQNWVST"
     CDS             join(5831..6045,8379..8424)
                     /note="tat protein"
                     /codon_start=1
                     /protein_id="AAB50256.1"
                     /db_xref="GI:1906383"
                     /translation="MEPVDPRLEPWKHPGSQPKTACTNCYCKKCCFHCQVCFITKALG
                     ISYGRKKRRQRRRAHQNSQTHQASLSKQPTSQPRGDPTGPKE"
     exon            5831..6045
                     /note="tat protein, first expressed exon"
                     /number=2

     CDS             join(5970..6045,8379..8653)
                     /note="trs protein"
                     /codon_start=1
                     /protein_id="AAB50257.1"
                     /db_xref="GI:327744"
                     /translation="MAGRSGDSDEELIRTVRLIKLLYQSNPPPNPEGTRQARRNRRRR
                     WRERQRQIHSISERILGTYLGRSAEPVPLQLPPLERLTLDCNEDCGTSGTQGVGSPQI
                     LVESPTVLESGTKE"
     exon            5970..6045
                     /note="trs protein, first expressed exon"
                     /number=2
     intron          6046..8378
                     /note="tat, trs, 27K mRNA intron 2"
     CDS             6225..8795
                     /note="envelope polyprotein"
                     /codon_start=1
                     /protein_id="AAB50262.1"
                     /db_xref="GI:1906385"
                     /translation="MRVKEKYQHLWRWGWRWGTMLLGMLMICSATEKLWVTVYYGVPV
                     WKEATTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVVLVNVTENFNMWKNDMVE
                     QMHEDIISLWDQSLKPCVKLTPLCVSLKCTDLKNDTNTNSSSGRMIMEKGEIKNCSFN
                     ISTSIRGKVQKEYAFFYKLDIIPIDNDTTSYKLTSCNTSVITQACPKVSFEPIPIHYC
                     APAGFAILKCNNKTFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVIRSVN
                     FTDNAKTIIVQLNTSVEINCTRPNNNTRKRIRIQRGPGRAFVTIGKIGNMRQAHCNIS
                     RAKWNNTLKQIASKLREQFGNNKTIIFKQSSGGDPEIVTHSFNCGGEFFYCNSTQLFN
                     STWFNSTWSTEGSNNTEGSDTITLPCRIKQIINMWQKVGKAMYAPPISGQIRCSSNIT
                     GLLLTRDGGNSNNESEIFRPGGGDMRDNWRSELYKYKVVKIEPLGVAPTKAKRRVVQR
                     EKRAVGIGALFLGFLGAAGSTMGAASMTLTVQARQLLSGIVQQQNNLLRAIEAQQHLL
                     QLTVWGIKQLQARILAVERYLKDQQLLGIWGCSGKLICTTAVPWNASWSNKSLEQIWN
                     HTTWMEWDREINNYTSLIHSLIEESQNQQEKNEQELLELDKWASLWNWFNITNWLWYI
                     KLFIMIVGGLVGLRIVFAVLSIVNRVRQGYSPLSFQTHLPTPRGPDRPEGIEEEGGER
                     DRDRSIRLVNGSLALIWDDLRSLCLFSYHRLRDLLLIVTRIVELLGRRGWEALKYWWN
                     LLQYWSQELKNSAVSLLNATAIAVAEGTDRVIEVVQGACRAIRHIPRRIRQGLERILL"

     exon            8379..8652
                     /note="trs protein"
                     /number=3
     exon            8379..8424
                     /note="tat protein"
                     /number=3
     CDS             8797..9168
                     /note="27K protein (premature termination)"
                     /codon_start=1
                     /protein_id="AAB50263.1"
                     /db_xref="GI:1906386"
                     /translation="MGGKWSKSSVIGWPTVRERMRRAEPAADRVGAASRDLEKHGAIT
                     SSNTAATNAACAWLEAQEEEEVGFPVTPQVPLRPMTYKAAVDLSHFLKEKGGLEGLIH
                     SQRRQDILDLWIYHTQGYFPD"
     LTR             9086..9719
                     /note="3' LTR"
     repeat_region   9540..9636
                     /note="R repeat 3' copy"
     polyA_signal    9612..9617
                     /note="HXB2 mRNA polyadenylation signal"
BASE COUNT     3411 a   1772 c   2373 g   2163 t

ORIGIN 

1 tggaagggct aattcactcc caacgaagac aagatatcct tgatctgtgg atctaccaca 
61 cacaaggcta cttccctgat tagcagaact acacaccagg gccagggatc agatatccac 
121 tgacctttgg atggtgctac aagctagtac cagttgagcc agagaagtta gaagaagcca 
181 acaaaggaga gaacaccagc ttgttacacc ctgtgagcct gcatggaatg gatgacccgg 
241 agagagaagt gttagagtgg aggtttgaca gccgcctagc atttcatcac atggcccgag 
301 agctgcatcc ggagtacttc aagaactgct gacatcgagc ttgctacaag ggactttccg
361 ctggggactt tccagggagg cgtggcctgg gcgggactgg ggagtggcga gccctcagat
421 cctgcatata agcagctgct ttttgcctgt actgggtctc tctggttaga ccagatctga 



481 gcctgggagc tctctggcta actagggaac ccactgctta agcctcaata aagcttgcct
541 tgagtgcttc aagtagtgtg tgcccgtctg ttgtgtgact ctggtaacta gagatccctc 
601 agaccctttt agtcagtgtg gaaaatctct agcagtggcg cccgaacagg gacctgaaag 
661 cgaaagggaa accagaggag ctctctcgac gcaggactcg gcttgctgaa gcgcgcacgg 
721 caagaggcga ggggcggcga ctggtgagta cgccaaaaat tttgactagc ggaggctaga 
781 aggagagaga tgggtgcgag agcgtcagta ttaagcgggg gagaattaga tcgatgggaa 
841 aaaattcggt taaggccagg gggaaagaaa aaatataaat taaaacatat agtatgggca
901 agcagggagc tagaacgatt cgcagttaat cctggcctgt tagaaacatc agaaggctgt 
961 agacaaatac tgggacagct acaaccatcc cttcagacag gatcagaaga acttagatca 
1021 ttatataata cagtagcaac cctctattgt gtgcatcaaa ggatagagat aaaagacacc 
1081 aaggaagctt tagacaagat agaggaagag caaaacaaaa gtaagaaaaa agcacagcaa 
1141 gcagcagctg acacaggaca cagcaatcag gtcagccaaa attaccctat agtgcagaac 
1201 atccaggggc aaatggtaca tcaggccata tcacctagaa ctttaaatgc atgggtaaaa 
1261 gtagtagaag agaaggcttt cagcccagaa gtgataccca tgttttcagc attatcagaa 
1321 ggagccaccc cacaagattt aaacaccatg ctaaacacag tggggggaca tcaagcagcc 
1381 atgcaaatgt taaaagagac catcaatgag gaagctgcag aatgggatag agtgcatcca
1441 gtgcatgcag ggcctattgc accaggccag atgagagaac caaggggaag tgacatagca 
1501 ggaactacta gtacccttca ggaacaaata ggatggatga caaataatcc acctatccca 
1561 gtaggagaaa tttataaaag atggataatc ctgggattaa ataaaatagt aagaatgtat 
1621 agccctacca gcattctgga cataagacaa ggaccaaagg aaccctttag agactatgta 
1681 gaccggttct ataaaactct aagagccgag caagcttcac aggaggtaaa aaattggatg 
1741 acagaaacct tgttggtcca aaatgcgaac ccagattgta agactatttt aaaagcattg 
1801 ggaccagcgg ctacactaga agaaatgatg acagcatgtc agggagtagg aggacccggc 
1861 cataaggcaa gagttttggc tgaagcaatg agccaagtaa caaattcagc taccataatg 
1921 atgcagagag gcaattttag gaaccaaaga aagattgtta agtgtttcaa ttgtggcaaa 
1981 gaagggcaca cagccagaaa ttgcagggcc cctaggaaaa agggctgttg gaaatgtgga 
2041 aaggaaggac accaaatgaa agattgtact gagagacagg ctaatttttt agggaagatc
2101 tggccttcct acaagggaag gccagggaat tttcttcaga gcagaccaga gccaacagcc 
2161 ccaccagaag agagcttcag gtctggggta gagacaacaa ctccccctca gaagcaggag 
2221 ccgatagaca aggaactgta tcctttaact tccctcaggt cactctttgg caacgacccc 
2281 tcgtcacaat aaagataggg gggcaactaa aggaagctct attagataca ggagcagatg 
2341 atacagtatt agaagaaatg agtttgccag gaagatggaa accaaaaatg atagggggaa 
2401 ttggaggttt tatcaaagta agacagtatg atcagatact catagaaatc tgtggacata 
2461 aagctatagg tacagtatta gtaggaccta cacctgtcaa cataattgga agaaatctgt 
2521 tgactcagat tggttgcact ttaaattttc ccattagccc tattgagact gtaccagtaa 
2581 aattaaagcc aggaatggat ggcccaaaag ttaaacaatg gccattgaca gaagaaaaaa
2641 taaaagcatt agtagaaatt tgtacagaga tggaaaagga agggaaaatt tcaaaaattg
2701 ggcctgaaaa tccatacaat actccagtat ttgccataaa gaaaaaagac agtactaaat
2761 ggagaaaatt agtagatttc agagaactta ataagagaac tcaagacttc tgggaagttc
2821 aattaggaat accacatccc gcagggttaa aaaagaaaaa atcagtaaca gtactggatg
2881 tgggtgatgc atatttttca gttcccttag atgaagactt caggaagtat actgcattta
2941 ccatacctag tataaacaat gagacaccag ggattagata tcagtacaat gtgcttccac
3001 agggatggaa aggatcacca gcaatattcc aaagtagcat gacaaaaatc ttagagcctt
3061 ttagaaaaca aaatccagac atagttatct atcaatacat ggatgatttg tatgtaggat
3121 ctgacttaga aatagggcag catagaacaa aaatagagga gctgagacaa catctgttga 
3181 ggtggggact taccacacca gacaaaaaac atcagaaaga acctccattc ctttggatgg 
3241 gttatgaact ccatcctgat aaatggacag tacagcctat agtgctgcca gaaaaagaca 
3301 gctggactgt caatgacata cagaagttag tggggaaatt gaattgggca agtcagattt
3361 acccagggat taaagtaagg caattatgta aactccttag aggaaccaaa gcactaacag 
3421 aagtaatacc actaacagaa gaagcagagc tagaactggc agaaaacaga gagattctaa 
3481 aagaaccagt acatggagtg tattatgacc catcaaaaga cttaatagca gaaatacaga 
3541 agcaggggca aggccaatgg acatatcaaa tttatcaaga gccatttaaa aatctgaaaa 
3601 caggaaaata tgcaagaatg aggggtgccc acactaatga tgtaaaacaa ttaacagagg
3661 cagtgcaaaa aataaccaca gaaagcatag taatatgggg aaagactcct aaatttaaac
3721 tgcccataca aaaggaaaca tgggaaacat ggtggacaga gtattggcaa gccacctgga
3781 ttcctgagtg ggagtttgtt aatacccctc ccttagtgaa attatggtac cagttagaga 
3841 aagaacccat agtaggagca gaaaccttct atgtagatgg ggcagctaac agggagacta
3901 aattaggaaa agcaggatat gttactaata gaggaagaca aaaagttgtc accctaactg
3961 acacaacaaa tcagaagact gagttacaag caatttatct agctttgcag gattcgggat
4021 tagaagtaaa catagtaaca gactcacaat atgcattagg aatcattcaa gcacaaccag
4081 atcaaagtga atcagagtta gtcaatcaaa taatagagca gttaataaaa aaggaaaagg
4141 tctatctggc atgggtacca gcacacaaag gaattggagg aaatgaacaa gtagataaat
4201 tagtcagtgc tggaatcagg aaagtactat ttttagatgg aatagataag gcccaagatg
4261 aacatgagaa atatcacagt aattggagag caatggctag tgattttaac ctgccacctg
4321 tagtagcaaa agaaatagta gccagctgtg ataaatgtca gctaaaagga gaagccatgc
4381 atggacaagt agactgtagt ccaggaatat ggcaactaga ttgtacacat ttagaaggaa
4441 aagttatcct ggtagcagtt catgtagcca gtggatatat agaagcagaa gttattccag
4501 cagaaacagg gcaggaaaca gcatattttc ttttaaaatt agcaggaaga tggccagtaa
4561 aaacaataca tactgacaat ggcagcaatt tcaccggtgc tacggttagg gccgcctgtt
4621 ggtgggcggg aatcaagcag gaatttggaa ttccctacaa tccccaaagt caaggagtag
4681 tagaatctat gaataaagaa ttaaagaaaa ttataggaca ggtaagagat caggctgaac
4741 atcttaagac agcagtacaa atggcagtat tcatccacaa ttttaaaaga aaagggggga
4801 ttggggggta cagtgcaggg gaaagaatag tagacataat agcaacagac atacaaacta
4861 aagaattaca aaaacaaatt acaaaaattc aaaattttcg ggtttattac agggacagca
4921 gaaatccact ttggaaagga ccagcaaagc tcctctggaa aggtgaaggg gcagtagtaa
4981 tacaagataa tagtgacata aaagtagtgc caagaagaaa agcaaagatc attagggatt
5041 atggaaaaca gatggcaggt gatgattgtg tggcaagtag acaggatgag gattagaaca
5101 tggaaaagtt tagtaaaaca ccatatgtat gtttcaggga aagctagggg atggttttat
5161 agacatcact atgaaagccc tcatccaaga ataagttcag aagtacacat cccactaggg
5221 gatgctagat tggtaataac aacatattgg ggtctgcata caggagaaag agactggcat
5281 ttgggtcagg gagtctccat agaatggagg aaaaagagat atagcacaca agtagaccct
5341 gaactagcag accaactaat tcatctgtat tactttgact gtttttcaga ctctgctata
5401 agaaaggcct tattaggaca catagttagc cctaggtgtg aatatcaagc aggacataac
5461 aaggtaggat ctctacaata cttggcacta gcagcattaa taacaccaaa aaagataaag
5521 ccacctttgc ctagtgttac gaaactgaca gaggatagat ggaacaagcc ccagaagacc
5581 aagggccaca gagggagcca cacaatgaat ggacactaga gcttttagag gagcttaaga
5641 atgaagctgt tagacatttt cctaggattt ggctccatgg cttagggcaa catatctatg
5701 aaacttatgg ggatacttgg gcaggagtgg aagccataat aagaattctg caacaactgc
5761 tgtttatcca ttttcagaat tgggtgtcga catagcagaa taggcgttac tcgacagagg
5821 agagcaagaa atggagccag tagatcctag actagagccc tggaagcatc caggaagtca
5881 gcctaaaact gcttgtacca attgctattg taaaaagtgt tgctttcatt gccaagtttg
5941 tttcataaca aaagccttag gcatctccta tggcaggaag aagcggagac agcgacgaag
6001 agctcatcag aacagtcaga ctcatcaagc ttctctatca aagcagtaag tagtacatgt
6061 aacgcaacct ataccaatag tagcaatagt agcattagta gtagcaataa taatagcaat
6121 agttgtgtgg tccatagtaa tcatagaata taggaaaata ttaagacaaa gaaaaataga
6181 caggttaatt gatagactaa tagaaagagc agaagacagt ggcaatgaga gtgaaggaga
6241 aatatcagca cttgtggaga tgggggtgga gatggggcac catgctcctt gggatgttga
6301 tgatctgtag tgctacagaa aaattgtggg tcacagtcta ttatggggta cctgtgtgga
6361 aggaagcaac caccactcta ttttgtgcat cagatgctaa agcatatgat acagaggtac
6421 ataatgtttg ggccacacat gcctgtgtac ccacagaccc caacccacaa gaagtagtat
6481 tggtaaatgt gacagaaaat tttaacatgt ggaaaaatga catggtagaa cagatgcatg
6541 aggatataat cagtttatgg gatcaaagcc taaagccatg tgtaaaatta accccactct
6601 gtgttagttt aaagtgcact gatttgaaga atgatactaa taccaatagt agtagcggga
6661 gaatgataat ggagaaagga gagataaaaa actgctcttt caatatcagc acaagcataa
6721 gaggtaaggt gcagaaagaa tatgcatttt tttataaact tgatataata ccaatagata
6781 atgatactac cagctataag ttgacaagtt gtaacacctc agtcattaca caggcctgtc
6841 caaaggtatc ctttgagcca attcccatac attattgtgc cccggctggt tttgcgattc
6901 taaaatgtaa taataagacg ttcaatggaa caggaccatg tacaaatgtc agcacagtac
6961 aatgtacaca tggaattagg ccagtagtat caactcaact gctgttaaat ggcagtctag
7021 cagaagaaga ggtagtaatt agatctgtca atttcacgga caatgctaaa accataatag
7081 tacagctgaa cacatctgta gaaattaatt gtacaagacc caacaacaat acaagaaaaa
7141 gaatccgtat ccagagagga ccagggagag catttgttac aataggaaaa ataggaaata
7201 tgagacaagc acattgtaac attagtagag caaaatggaa taacacttta aaacagatag
7261 ctagcaaatt aagagaacaa tttggaaata ataaaacaat aatctttaag caatcctcag
7321 gaggggaccc agaaattgta acgcacagtt ttaattgtgg aggggaattt ttctactgta
7381 attcaacaca actgtttaat agtacttggt ttaatagtac ttggagtact gaagggtcaa
7441 ataacactga aggaagtgac acaatcaccc tcccatgcag aataaaacaa attataaaca
7501 tgtggcagaa agtaggaaaa gcaatgtatg cccctcccat cagtggacaa attagatgtt
7561 catcaaatat tacagggctg ctattaacaa gagatggtgg taatagcaac aatgagtccg
7621 agatcttcag acctggagga ggagatatga gggacaattg gagaagtgaa ttatataaat
7681 ataaagtagt aaaaattgaa ccattaggag tagcacccac caaggcaaag agaagagtgg
7741 tgcagagaga aaaaagagca gtgggaatag gagctttgtt ccttgggttc ttgggagcag
7801 caggaagcac tatgggcgca gcctcaatga cgctgacggt acaggccaga caattattgt
7861 ctggtatagt gcagcagcag aacaatttgc tgagggctat tgaggcgcaa cagcatctgt
7921 tgcaactcac agtctggggc atcaagcagc tccaggcaag aatcctggct gtggaaagat
7981 acctaaagga tcaacagctc ctggggattt ggggttgctc tggaaaactc atttgcacca
8041 ctgctgtgcc ttggaatgct agttggagta ataaatctct ggaacagatt tggaatcaca
8101 cgacctggat ggagtgggac agagaaatta acaattacac aagcttaata cactccttaa
8161 ttgaagaatc gcaaaaccag caagaaaaga atgaacaaga attattggaa ttagataaat
8221 gggcaagttt gtggaattgg tttaacataa caaattggct gtggtatata aaattattca
8281 taatgatagt aggaggcttg gtaggtttaa gaatagtttt tgctgtactt tctatagtga
8341 atagagttag gcagggatat tcaccattat cgtttcagac ccacctccca accccgaggg
8401 gacccgacag gcccgaagga atagaagaag aaggtggaga gagagacaga gacagatcca
8461 ttcgattagt gaacggatcc ttggcactta tctgggacga tctgcggagc ctgtgcctct
8521 tcagctacca ccgcttgaga gacttactct tgattgtaac gaggattgtg gaacttctgg
8581 gacgcagggg gtgggaagcc ctcaaatatt ggtggaatct cctacagtat tggagtcagg
8641 aactaaagaa tagtgctgtt agcttgctca atgccacagc catagcagta gctgagggga
8701 cagatagggt tatagaagta gtacaaggag cttgtagagc tattcgccac atacctagaa
8761 gaataagaca gggcttggaa aggattttgc tataagatgg gtggcaagtg gtcaaaaagt
8821 agtgtgattg gatggcctac tgtaagggaa agaatgagac gagctgagcc agcagcagat
8881 agggtgggag cagcatctcg agacctggaa aaacatggag caatcacaag tagcaataca
8941 gcagctacca atgctgcttg tgcctggcta gaagcacaag aggaggagga ggtgggtttt
9001 ccagtcacac ctcaggtacc tttaagacca atgacttaca aggcagctgt agatcttagc
9061 cactttttaa aagaaaaggg gggactggaa gggctaattc actcccaaag aagacaagat
9121 atccttgatc tgtggatcta ccacacacaa ggctacttcc ctgattagca gaactacaca
9181 ccagggccag gggtcagata tccactgacc tttggatggt gctacaagct agtaccagtt
9241 gagccagata agatagaaga ggccaataaa ggagagaaca ccagcttgtt acaccctgtg
9301 agcctgcatg ggatggatga cccggagaga gaagtgttag agtggaggtt tgacagccgc
9361 ctagcatttc atcacgtggc ccgagagctg catccggagt acttcaagaa ctgctgacat
9421 cgagcttgct acaagggact ttccgctggg gactttccag ggaggcgtgg cctgggcggg
9481 actggggagt ggcgagccct cagatcctgc atataagcag ctgctttttg cctgtactgg
9541 gtctctctgg ttagaccaga tctgagcctg ggagctctct ggctaactag ggaacccact
9601 gcttaagcct caataaagct tgccttgagt gcttcaagta gtgtgtgccc gtctgttgtg
9661 tgactctggt aactagagat ccctcagacc cttttagtca gtgtggaaaa tctctagca
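
Feature locations in the table of this record are 1-based and inclusive, and a join(...) location lists the exons of a spliced coding sequence in order. The following is a minimal sketch of how such locations might be read; the helper names parse_location, spliced_length and extract are hypothetical and are not part of any GenBank tool. The short test sequence is made up for illustration.

```python
import re


def parse_location(location: str) -> list[tuple[int, int]]:
    """Parse 'a..b' or 'join(a..b,c..d)' into (start, end) pairs.

    Coordinates are returned exactly as written: 1-based and inclusive.
    """
    spans = re.findall(r"(\d+)\.\.(\d+)", location.replace(" ", ""))
    if not spans:
        raise ValueError(f"no spans found in {location!r}")
    return [(int(start), int(end)) for start, end in spans]


def spliced_length(location: str) -> int:
    """Total number of bases covered by all spans of a location."""
    return sum(end - start + 1 for start, end in parse_location(location))


def extract(sequence: str, location: str) -> str:
    """Concatenate the spans of a location from a 0-indexed Python string."""
    return "".join(sequence[start - 1:end] for start, end in parse_location(location))


# The tat CDS location as given in the feature table of this record.
tat = "join(5831..6045,8379..8424)"
print(parse_location(tat))           # [(5831, 6045), (8379, 8424)]
print(spliced_length(tat))           # 261
print(spliced_length(tat) // 3 - 1)  # 86 residues once the stop codon is excluded

# Toy sequence (made up) to show the 1-based, inclusive slicing.
print(extract("acgtacgtac", "join(1..3,8..10)"))  # acgtac
```

The 86-residue figure agrees with the length of the tat /translation qualifier in this record, which is a quick consistency check on the exon coordinates.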



The oligonucleotide probe for the AP-1 sequence in step 63 was synthesized using
the PMA-responsive element as the consensus sequence, as indicated by
Northrop et al., 1993, with flanking sequences added.

These two sequences, which were used as probes, are representative examples that
demonstrate the methodology of DNA-protein interaction. Any other relevant
sequence(s) can be used for this purpose.
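
The probe design described above (a consensus element with flanking sequences added) can be sketched as follows. This is only an illustration under stated assumptions: the core TGAGTCA is one form of the commonly cited TPA/PMA-responsive element, TGA(C/G)TCA, while the flanks CGC and GCG and the function names are hypothetical placeholders, not the sequences used in the original work.

```python
# Map each base to its Watson-Crick partner (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Assumed core: one form of the TPA/PMA-responsive element TGA(C/G)TCA.
TRE_CORE = "TGAGTCA"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_probe(core: str, left_flank: str, right_flank: str) -> tuple[str, str]:
    """Return (top strand, bottom strand) of a double-stranded probe.

    Both strands are written 5' to 3'. The flanks are placeholders chosen
    by the caller.
    """
    top = left_flank + core + right_flank
    return top, reverse_complement(top)


# Hypothetical flanks, for illustration only.
top, bottom = build_probe(TRE_CORE, "CGC", "GCG")
print(top)     # CGCTGAGTCAGCG
print(bottom)  # CGCTGACTCAGCG
```

Any other binding site can be substituted for the core, in line with the statement that other relevant sequences can be used.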